Direct identification vs. correlated models to process acoustic and articulatory informations in automatic speech recognition
Authors
Abstract
Our work deals with the classical problem of merging heterogeneous and asynchronous parameters. It is well known that lip reading improves the speech recognition score, especially in noisy conditions; we therefore study more precisely the modeling of acoustic and labial parameters and propose two Automatic Speech Recognition systems: (1) a Direct Identification, performed with a classical HMM approach, in which no correlation between visual and acoustic parameters is assumed; (2) two correlated models, a master HMM and a slave HMM, which process the labial observations and the acoustic ones respectively. To assess each approach, we use a segmental pre-processing. Our task is the recognition of spelled French letters, in clear and noisy (cocktail party) environments. Whatever the approach and condition, the introduction of labial features improves performance, but the difference between the two models is not large enough to give either one priority.
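As an illustration of the direct-identification strategy described above (frame-wise concatenation of acoustic and labial features, with one HMM per spelled letter and recognition by maximum likelihood), a minimal sketch using the hmmlearn library is given below. The feature dimensions, data-loading structure, and helper names are hypothetical and not taken from the paper; the correlated master/slave model is not shown.

```python
# Minimal sketch of the "direct identification" fusion strategy:
# acoustic and labial feature vectors are concatenated frame by frame,
# one Gaussian HMM is trained per letter class, and recognition picks
# the class with the highest log-likelihood. All helper names and data
# layouts are hypothetical.

import numpy as np
from hmmlearn import hmm


def concat_streams(acoustic, labial):
    """Frame-wise concatenation of acoustic (T x Da) and labial (T x Dl) features."""
    return np.hstack([acoustic, labial])


def train_letter_models(training_data, n_states=5):
    """training_data: dict mapping letter -> list of (acoustic, labial) utterance pairs."""
    models = {}
    for letter, utterances in training_data.items():
        feats = [concat_streams(a, l) for a, l in utterances]
        X = np.vstack(feats)                      # stacked frames of all utterances
        lengths = [f.shape[0] for f in feats]     # per-utterance frame counts
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
        m.fit(X, lengths)
        models[letter] = m
    return models


def recognise(models, acoustic, labial):
    """Return the letter whose HMM assigns the highest log-likelihood to the utterance."""
    obs = concat_streams(acoustic, labial)
    return max(models, key=lambda letter: models[letter].score(obs))
```

Because the two streams are simply concatenated before training, this sketch makes exactly the assumption the paper attributes to direct identification: no explicit model of the correlation (or asynchrony) between the visual and acoustic parameters.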
Similar papers
Articulatory-Acoustic-Feature-based Automatic Language Identification
Automatic language identification is one of the important topics in multilingual speech technology. Ideal language identification systems should be able to classify the language of speech utterances within a specific time before further processing by language-dependent speech recognition systems or monolingual listeners begins. Currently the best language identification systems are based on HMM...
AMULET: automatic MUltisensor speech labelling and event tracking: study of the spatio-temporal correlations in voiceless plosive production
Speech production is a complex process relying on coordinated gestures, but the acoustic signal does not depict its underlying organization. Accepting that articulatory gestures are directly recognized through the coarticulation process, our proposal is to investigate the correlations between acoustic and articulatory information and to assess gestural phonetic theory. We present here the fra...
Integrating Articulatory Features into Acoustic Models for Speech Recognition
It is often assumed that acoustic-phonetic or articulatory features can be beneficial for automatic speech recognition (ASR), e.g. because of their supposedly greater noise robustness or because they provide a more convenient interface to higher-level components of ASR systems such as pronunciation modeling. However, the success of these features when used as an alternative to standard acoustic...
Combining acoustic and articulatory feature information for robust speech recognition
The idea of using articulatory representations for automatic speech recognition (ASR) continues to attract much attention in the speech community. Representations which are grouped under the label "articulatory" include articulatory parameters derived by means of acoustic-articulatory transformations (inverse filtering), direct physical measurements or classification scores for pseudo-articul...
Investigation of Language Structure by Means of Language Models Incorporating Breathing and Articulatory Noise
In our experiment we used a bigram language model and a standard speech recogniser to test if linguistic information is related to the position of silence, articulatory noise, background noise, laughing and breathing in spontaneous speech. We observed that for silence and articulatory noise the acoustic modelling is more important than linguistic information represented in the bigrams of a lang...